A Bayesian semi-parametric approach to cluster heterogeneous time series
Abstract
The majority of time series clustering research focuses on calculating similarity metrics between individual series, which, in conjunction with a traditional clustering algorithm, partition the data into groups of similar series (clusters). A major challenge lies in obtaining partitions when the number of clusters is not known in advance. Another challenge is to exploit known hierarchies and heterogeneities in the data to refine the clustering. In this work, we aim to alleviate both challenges using a Bayesian semi-parametric model for clustering time series from data sources with known heterogeneity (e.g., different sensors). We apply a hierarchical normal model to represent the heterogeneity of sensor types within the sensor network and a dynamic linear model to represent the time series. The clustering is based on a subset of the parameters of the dynamic linear model; the remaining parameters are used to increase the likelihood of the data under the model. We present a Gibbs sampling algorithm to train the model and learn its parameters. We illustrate our approach with a dataset of EMG measurements recorded during different trials of locomotion. Our evaluation shows that the model learns significant clusters present within the data, filtering out the variance resulting from heterogeneity and random noise.
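The abstract does not give the exact form of the dynamic linear model, so the following is only a minimal sketch under an assumed local-level specification: y_t = theta_t + v_t with v_t ~ N(0, V), and theta_t = theta_(t-1) + w_t with w_t ~ N(0, W). It shows how a Kalman filter could evaluate the log-likelihood of one series for given variances, the kind of quantity a Gibbs sampler might use when scoring a series against a cluster's parameters. The function names, the local-level form, and the prior values m0 and c0 are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np


def simulate_local_level(n, obs_var, state_var, rng):
    """Draw one series from the assumed local-level DLM (random-walk level plus noise)."""
    theta = np.cumsum(rng.normal(0.0, math.sqrt(state_var), size=n))
    return theta + rng.normal(0.0, math.sqrt(obs_var), size=n)


def local_level_loglik(y, obs_var, state_var, m0=0.0, c0=1.0):
    """Log-likelihood of y under the local-level DLM, computed with a Kalman filter.

    m0 and c0 are the (assumed) prior mean and variance of the initial level.
    """
    m, c = m0, c0
    loglik = 0.0
    for obs in y:
        r = c + state_var          # prior variance of the level at this step
        q = r + obs_var            # one-step-ahead forecast variance
        err = obs - m              # forecast error (forecast mean equals m)
        loglik += -0.5 * (math.log(2.0 * math.pi * q) + err * err / q)
        gain = r / q               # Kalman gain
        m = m + gain * err         # posterior mean of the level
        c = r * (1.0 - gain)       # posterior variance of the level
    return loglik


rng = np.random.default_rng(0)
y = simulate_local_level(200, obs_var=1.0, state_var=0.1, rng=rng)
print(local_level_loglik(y, obs_var=1.0, state_var=0.1))
```

Comparing this log-likelihood across candidate parameter values is one plausible way to judge how well a series fits a cluster. The hierarchical normal layer for sensor types and the clustering prior described in the abstract are not modelled here.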
Similar resources
Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses
Recently, quantile regression (QR) models have often been applied to longitudinal data analysis. When the distribution of responses appears skewed and asymmetric due to outliers and heavy tails, QR models may work well. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...
Unsupervised Modeling of Patient-Level Disease Dynamics
To provide insight into patient-level disease dynamics from data collected at irregular time intervals, this work extends applications of semi-parametric clustering for temporal mining. In the semi-parametric clustering framework, Markovian models provide useful parametric assumptions for modeling temporal dynamics, and a non-parametric method is used to cluster the temporal abstractions instea...
Dirichlet Mixtures of Bayesian Linear Gaussian State-Space Models: a Variational Approach
We describe two related models to cluster multidimensional time series under the assumption of an underlying linear Gaussian dynamical process. In the first model, time series are assigned to the same cluster when they show global similarity in their dynamics, while in the second model time series are assigned to the same cluster when they show simultaneous similarity. Both models are based o...
Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects
This paper develops a semi-parametric Bayesian regression model for estimating heterogeneous treatment effects from observational data. Standard nonlinear regression models, which may work quite well for prediction, can yield badly biased estimates of treatment effects when fit to data with strong confounding. Our Bayesian causal forests model avoids this problem by directly incorporating an es...
Differential and trajectory methods for time course gene expression data
MOTIVATION: The issue of high dimensionality in microarray data has been, and remains, a hot topic in statistical and computational analysis. Efficient gene filtering and differentiation approaches can reduce the dimensions of data, help to remove redundant genes and noise, and highlight the most relevant genes that are major players in the development of certain diseases or the effect of drug ...
Publication date: 2015